ABSTRACT

This work describes a unified approach to two problems pre- 
viously addressed separately in Intelligent Tutoring Systems: 
(i) Cognitive Modeling, which factorizes problem solving 
steps into the latent set of skills required to perform them 
[7]; and (ii) Student Modeling, which infers students’ learn- 
ing by observing student performance [9]. 

A better understanding of how students learn matters in
practice because it helps build better intelligent tutors [8].
The expected advantages of our integrated approach include 
(i) more accurate prediction of a student’s future perfor- 
mance, and (ii) clustering items into skills automatically, 
without expensive manual expert knowledge annotation. 

We introduce a unified model, Dynamic Cognitive Trac- 
ing, to explain student learning in terms of skill mastery 
over time, by learning the Cognitive Model and the Stu- 
dent Model jointly. We formulate our approach as a graph- 
ical model, and we validate it using sixty different synthetic 
datasets. Dynamic Cognitive Tracing significantly outper- 
forms single-skill Knowledge Tracing on predicting future 
student performance. 

1. INTRODUCTION 

We propose Dynamic Cognitive Tracing as a method that 
estimates from performance data: 

1. A Student Model. The estimate of a student’s knowledge
of a skill at a given time.

2. A Cognitive Model. The skills a student requires
to solve a problem step.

Let’s illustrate the student modeling problem with an ex- 
ample. Suppose we are interested in modeling data from a 
reading tutor that listens to children read aloud. Figure 1 
shows sample data in this scenario. We follow the convention 
of referring to the scorable steps in an intelligent tutor task 
as “items” [27]. The input variable is the item id_t, which in
this case is the word read by a student at time step t. The
target variable p_t is the performance of the student, in this
case whether the tutor accepted the word read. The student
reads the words “smile because it” correctly, but misreads 
the word “happened”. The student modeling problem is to 
predict future student performance. 

Existing student modeling techniques require cognitive
models, that is, assignments of items to skills [9]. This is a
very expensive requirement, since it often depends on expert
domain knowledge [4]. For example, in our reading tutor
scenario, it is not a trivial endeavor to cluster a dictionary
of words into the set of skills needed to read them.

Figure 1: Reading tutor example of student modeling

Unfortunately, the success of existing methods for auto- 
matic construction of cognitive models has been limited [11]. 
Current methods for discovering cognitive models are re- 
stricted in that they cannot handle longitudinal data, or 
that they are not fully automatic. For example, Princi- 
pal Component Analysis, Non-Negative Matrix Factoriza- 
tion [27] and the Q-Matrix Method [2] ignore the temporal 
dimension of the data. On the other hand, Learning Factors 
Analysis [7] is designed for temporal data, but it requires an 
expert’s cognitive model. Our main contribution is a fully
automatic approach to discover a cognitive model from
longitudinal student data. Our goal is to discover student
models while simultaneously clustering similar items together.

The rest of this document is organized as follows.
Section 2 reviews related prior work. Section 3 describes our
approach, Dynamic Cognitive Tracing, which learns a student
model jointly with a factorization of items into skills.
Section 4 evaluates performance using synthetic data.
Section 5 provides some concluding remarks.

2. RELATION TO PRIOR WORK 

In this section we study Dynamic Cognitive Tracing’s re- 
lation with prior work. Section 2.1 surveys previous ap- 
proaches to learn student models. Section 2.2 summarizes 
automatic approaches for cognitive model discovery. 

2.1 Student Modeling

Corbett and Anderson’s seminal paper [9] introduced
Knowledge Tracing as a way to model students’ changing
knowledge during skill acquisition. It uses (a) a cognitive model
that maps a problem-solving item to the skills required,
and (b) logs of students’ correct and incorrect answers as
evidence of their knowledge of a particular skill. Reye [22]
showed that there is an equivalent formulation of Knowledge 
Tracing as a Bayesian Network. Knowledge Tracing has 
enabled significantly faster teaching by Intelligent Tutors, 
while achieving the same performance on evaluations [8]. 
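
As a rough illustration of how Knowledge Tracing tracks a single skill, the following Python sketch applies the standard Bayesian update after each observed answer and then a learning transition. The function names and the parameter values in the usage note are assumptions made for illustration; they are not taken from the cited implementations. The sketch assumes no forgetting and a guess probability greater than zero.

```python
def kt_predict_correct(p_known, guess, slip):
    """Probability that the next answer is correct under Knowledge Tracing."""
    return p_known * (1.0 - slip) + (1.0 - p_known) * guess


def kt_update(p_known, correct, learn, guess, slip):
    """Posterior P(known) after one observed answer, followed by one learning step."""
    if correct:
        evidence = p_known * (1.0 - slip)
        posterior = evidence / (evidence + (1.0 - p_known) * guess)
    else:
        evidence = p_known * slip
        posterior = evidence / (evidence + (1.0 - p_known) * (1.0 - guess))
    # Assumed: no forgetting. An unknown skill becomes known with probability `learn`.
    return posterior + (1.0 - posterior) * learn


def trace(observations, p_init, learn, guess, slip):
    """Return the predicted P(correct) made before each observation is seen."""
    p_known, predictions = p_init, []
    for correct in observations:
        predictions.append(kt_predict_correct(p_known, guess, slip))
        p_known = kt_update(p_known, correct, learn, guess, slip)
    return predictions
```

For example, `trace([1, 1, 0, 1], p_init=0.0, learn=0.2, guess=0.2, slip=0.1)` returns one prediction per time step, each made before the corresponding answer is observed.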

Parameter estimation for Knowledge Tracing, as well as for
Dynamic Cognitive Tracing, is a non-convex problem. This
means that the optimizer that estimates the parameters of
the models might get stuck in local optima far away from the
global optimum. Moreover, these formulations are also
non-identifiable: there may exist many student models that
explain the observed data equally well. In Knowledge Tracing,
the main source of non-identifiability is the trade-off between
the probability of a student’s initial knowledge and the
probability of learning the skill [5]. To mitigate
non-identifiability, recent work has proposed the use of
Bayesian priors [5] or of contextual clues to estimate whether
a student has guessed [1].

Other approaches to student modeling include Performance
Factor Analysis [19, 14], which predicts student performance
based on the item difficulty and the student’s historical
performance. Alternatively, Learning Decomposition [6] uses
non-linear regression to determine how to weight different
types of practice opportunities relative to each other. More
recently, Tensor Factorization [25] has been applied to the
student modeling problem. It uses recommender system
techniques to learn student models. None of these techniques
aims to discover cognitive models. Thai-Nghe et al. [25] make
use of latent variables, but they argue that it is not possible
to interpret their semantics. Their formulation is tied to
specific students, and it is not clear how to generalize their
approach to students not seen in the training set, or to cases
where students encounter only a very sparse set of items. We
designed Dynamic Cognitive Tracing to discover latent factors
that can be interpreted as Cognitive and Student Models.

Desmarais [11] argues that the construction of a cognitive 
model from data is highly desirable, not only to avoid the 
labor intensive task of specifying which skills are involved 
in which task, but because a data-driven approach might 
outperform human judgment. In the next subsection we 
study such approaches. 

2.2 Automatic Discovery of Cognitive Models 

Winters et al. [27] surveyed methods for automatic con- 
struction of cognitive models. Examples are matrix factor- 
ization techniques, such as Principal Component Analysis 
(PCA) and Non-Negative Matrix Factorization (NNMF). 
The theoretical relationships between different matrix
factorization techniques have been studied in detail [24].

The Q-matrix algorithm [2, 3] is a hill-climbing method
that creates a cognitive model linking skills and items di- 
rectly from student response data. An alternative approach, 
Learning Factors Analysis [7], performs combinatorial search 
to evaluate and improve on existing cognitive models. 

None of the techniques reviewed in this section takes into
account the temporal dimension of the data without human
intervention. To the best of our knowledge, we are the first
to estimate a cognitive model completely automatically
from data collected over time.

3. DYNAMIC COGNITIVE TRACING 

We now describe Dynamic Cognitive Tracing. Subsection 3.1 
details our approach. Subsection 3.2 provides pointers on 
the training and inference algorithms used. Subsection 3.3 
shows how Dynamic Cognitive Tracing relates to two common
techniques used in student modeling and in automatic
generation of a cognitive model.

3.1 Model 

We formulate Dynamic Cognitive Tracing as a Bayesian
Network. Bayesian Networks [20] are a popular framework
for reasoning with noisy information. They are directed
acyclic graphical models in which the nodes are variables
and the edges specify statistical dependencies between
variables.

Bayesian Networks are often described using plate diagram
notation to show the statistical relationship between their
random variables. The plate diagram of Dynamic Cognitive
Tracing is shown in Figure 2(a). Instead of drawing
a variable multiple times, we follow the convention of
using a plate to group repeated variables. As an example, we
unroll Dynamic Cognitive Tracing using two skills in
Figure 2(b). The generative story of the variables is
given in Figure 3. We follow the convention of
using dark gray to color variables that are observable during
both training and testing. Variables visible during training
only are colored in light gray. Latent variables, which are
never observed, are drawn as white circles. A double
line around a variable indicates that its value is
calculated deterministically given its parents. The variables
in Dynamic Cognitive Tracing are:

• S is the number of skills in the model.

• Ids is the number of items that the student can practice
with the tutor. For example, in the case of a reading
tutor, Ids is the vocabulary size. If the tutor is
creating items on the fly, Ids is the number of templates
from which items are being generated.

• Q is an Ids x S matrix that maps items to skills. Each
row Q_id is modeled as a multinomial representing the
skills required for item id. For example, if Q_{id_t} =
[0.5, 0.5, 0, 0], we interpret item id_t to be a mixture of
skills 1 and 2. In this example id_t does not require
skills 3 and 4. Q need not be hidden. If in fact Q is
known, we can clamp the parameters to their known
values.

• q_t is the skill for item id_t. For example, q_t = 1 iff skill
1 is required for item id_t, q_t = 2 iff skill 2 is required,
and so on. q_t is chosen deterministically as row
number id_t of Q.

• K_{s,t} indicates whether the student has the knowledge
of skill s. Notice that there is a Markovian dependency
across time steps: if skill s is known at time t - 1, it is
likely to be known at time t. Therefore, we also need
to know which skills were active on the previous time
step (i.e., K_{s,t} depends on q_{t-1}). For simplicity, in this
work we treat each K as a binary variable (whether
the skill is known or not).

• k_{s,t} is a binary variable that represents whether the skill
is known and required by the item id_t. Hence, its value is
computed deterministically by applying a dot product
to its parents: k_{s,t} is true iff skill s is required (q_t = s),
and the student has learned the skill (K_{s,t} = 1).

• p_t is the target variable that models performance. It
is only observed during training.

— For discrete grades (e.g., right or wrong), a Binomial
distribution or logistic regression can be
used. The use of logistic regression in Bayesian
Networks has been studied in the context of
mixture of experts [16], and more recently for
the multiple subskill problem in student modeling [28].
In this paper we use the Binomial approach.

— For continuous grades (e.g., 0 to 100), linear
regression can be used.

Figure 2: Dynamic Cognitive Tracing as a graphical model:
(a) Plate diagram, (b) Unrolled example with two skills

Our main contribution is unsupervised estimation of the 
cognitive model Q from longitudinal data, while simultane- 
ously estimating the student model parameters. In the next 
subsection we study how to learn the parameters of Dynamic 
Cognitive Tracing, as well as how to perform inference on 
it. 

3.2 Training and Inference 

Dynamic Cognitive Tracing is formulated as a directed graph- 
ical model (Bayesian Network). We leverage existing tech- 
nologies to quickly implement a prototype of Dynamic Cog- 
nitive Tracing. We used the Bayesian Network Toolkit [18] 
(BNT) for Matlab. 

As described in the previous subsection, the knowledge of
a skill depends on its value at the previous time step.
This kind of dependency is called a Markov Chain. Therefore,
in Dynamic Cognitive Tracing, the student knowledge
of S skills is modeled using S layers of Markov Chains.
Unfortunately, this is not scalable, because exact inference on
layers of Markov Chains that produce a single output is
intractable: the runtime complexity grows exponentially in
the number of layers [12]. Hence, we limit our study to a
small number of skills. In future work we will implement
inference techniques that scale better, like Gibbs Sampling.

1. Draw Q_id ~ Multinomial, Ids times

2. For each time step t ∈ {0 ... T}:

(a) Draw id_t ~ Multinomial

(b) For each skill s ∈ [0 ... S]:

    (c) Set q_{s,t} <- Q_{id_t}

    (d) Draw K_{s,t} ~ Binomial

    (e) Set k_{s,t} <- K_{s,t} · q_{s,t}

(f) Draw p_t ~ N(k_{1,t}, k_{2,t}, ..., k_{S,t}) for continuous p, or
for binary variables either
p_t ~ logistic(k_{1,t}, k_{2,t}, ..., k_{S,t}) or p_t ~ Binomial

Figure 3: Generative story of Dynamic Cognitive Tracing
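
The generative story of Figure 3 can be sketched in Python for the binary-performance case. This is a minimal sketch under stated assumptions, not the authors' BNT implementation: items are drawn uniformly, each item activates exactly one skill drawn from its row of Q, students start with no knowledge, only the practiced skill can be learned, and there is no forgetting. The function name `sample_student` and its arguments are hypothetical.

```python
import random


def sample_student(Q, learn, guess, slip, T, rng):
    """Sample one student's sequence of (item, performance) pairs of length T.

    Q[i][s]  : probability that item i exercises skill s (each row sums to 1)
    learn[s] : P(skill s becomes known after it is practiced)
    guess[s] : P(correct | skill s is required but not known)
    slip[s]  : P(incorrect | skill s is required and known)
    """
    n_items, n_skills = len(Q), len(Q[0])
    known = [False] * n_skills  # K_{s,0}: assumed no initial knowledge
    sequence = []
    for _ in range(T):
        item = rng.randrange(n_items)                             # id_t (assumed uniform)
        skill = rng.choices(range(n_skills), weights=Q[item])[0]  # q_t
        k = known[skill]                                          # k_{s,t} for the active skill
        p_correct = (1.0 - slip[skill]) if k else guess[skill]
        correct = rng.random() < p_correct                        # p_t
        sequence.append((item, int(correct)))
        # Assumed transition: only the practiced skill can be learned; no forgetting.
        if not known[skill] and rng.random() < learn[skill]:
            known[skill] = True
    return sequence
```

A call such as `sample_student([[1, 0], [1, 0], [0, 1], [0, 1]], [0.3, 0.2], [0.1, 0.2], [0.1, 0.1], 25, random.Random(0))` yields one synthetic student with four item types and two skills. Tracking all S binary knowledge variables jointly gives 2^S hidden states per time step, which is the source of the exponential cost of exact inference mentioned above.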

The name Bayesian Network is a misnomer, because the
framework does not require Bayesian estimation; in fact, we
use Maximum Likelihood Estimation together with exact inference.
BNT implements the Junction Tree algorithm [15], an inference
algorithm that generalizes the Forward-Backward
algorithm that is used in Knowledge Tracing and Hidden
Markov Models [21]. To estimate the parameters of the
model, we use the Expectation-Maximization (E-M) algorithm [10].
Like other local optimizers applied to non-convex problems,
E-M is not guaranteed to find the globally optimal solution.

3.3 Unifying Perspective 

We now discuss how Dynamic Cognitive Tracing generalizes 
two common techniques for cognitive and student modeling. 

Figure 4: Two-skill models with one time step

Figure 5: Unrolled graphical model representation of
one-skill student models: (a) Knowledge Tracing,
(b) Dynamic Cognitive Tracing


Cognitive models have been built by matrix factorization 
techniques [27]. Probabilistic Principal Component Analy- 
sis (PPCA) [26] is an example of such matrix factorization 
techniques. It is a formulation of the Principal Component 
Analysis algorithm using graphical models. The main
advantages of this approach over conventional PCA are that
it can handle missing data, and that it provides a probabilistic
interpretation of the underlying factors.

In Figure 4(a) we show the graphical model representa- 
tion of PPCA when explicitly formulated to handle missing 
data. If the variable p is continuous, it is modeled with a 
Gaussian. If the variable p is discrete, it is modeled with a
Binomial, using a logistic link function. Discrete PCA is also
known in the literature as Logistic PCA [23]. Figure 4(b) 
shows the simplified Dynamic Cognitive Tracing with two 
skills, when there is no temporal information available. The 
structure of both graphical models is very similar: in both 
cases, the performance is explained by latent variables that 
represent the skills. The main difference is that Dynamic 
Cognitive Tracing takes into account the knowledge of the 
skill estimated from the student model: the performance is 
explained by the latent knowledge of the skills. We hypoth- 
esize that the advantage of our approach lies in the fact that 
it is not limited to a single time step like PPCA is. We
expect item-performance data to be very noisy, and the
temporal information to be useful to model skill
acquisition.

Figure 5(a) shows the graphical model representation of
Knowledge Tracing with a single-skill model, which is just
a Hidden Markov Model. Figure 5(b) shows the unrolled
single-skill Dynamic Cognitive Tracing (S = 1) counterpart.
In this case the structure of Dynamic Cognitive Tracing is
equivalent to Knowledge Tracing.

4. EMPIRICAL EVALUATION 

In this section, we report results of using Dynamic Cog- 
nitive Tracing to predict future student performance using 
synthetically generated datasets. In the context of this pa- 
per, we decouple the problem of discovering the assignments 
of items to skills and the problem of discovering the
number of skills. For our experiments, we assume the number of
skills is known. In a real scenario, where the number of skills
is unknown, it could be estimated by cross-validation
on a held-out set. We report the results of Dynamic
Cognitive Tracing using the true number of skills.

Dynamic Cognitive Tracing aims to discover the skills
automatically without supervision. We test whether the
cognitive model estimated by Dynamic Cognitive Tracing
outperforms a cognitive model that assigns all of the items to
a single skill. Therefore, as a baseline, we compare against
Knowledge Tracing using a single skill.

In all comparisons between Knowledge Tracing and
Dynamic Cognitive Tracing, their parameters are estimated
using the same training set. No student appears in both the
training and the test set.

4.1 Experimental setup 

In this section, we describe the synthetic data sets gener- 
ation criteria and the evaluation metrics. To generate the 
synthetic data sets, we use the generative story described 
in Figure 3, having each student encounter 25 items during
training (sequence length = 25). In preliminary experiments,
we noticed that by the 25th time step, most synthetic
students had learned the skills. To have a more balanced test
set that has roughly the same number of correct and incorrect
answers, the sequence length of the test set is sampled randomly.

We want synthetic data to be plausible; for example, the 
probability of answering an item correctly by guessing should 
be lower than the probability of answering an item correctly 
due to knowledge. Therefore, the synthetic datasets follow 
these constraints: 

• The learning probability, the probability of transitioning
from not knowing a skill to knowing it, lies in
[0.01, 0.45].

• The guess probability, the probability of answering
correctly given that the student does not know the skill,
lies in [0.01, 0.30].

• The slip probability, the probability of answering
incorrectly given that the student knows the skill, lies
in [0.01, 0.30].

Note that these constraints are only used for generating
the data. None of our models makes use of this prior
knowledge. For simplicity, in this paper we limit our study to
cognitive models that have only one skill active per item, but
Dynamic Cognitive Tracing does not make use of this
information. We constrain the models to not learn the “forget
probability” (i.e., the transition probability from “knowing”
to “not knowing” is zero).
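
The constraints on the synthetic parameters can be illustrated with a short Python sketch. The text gives only the admissible ranges, so drawing uniformly within each range is an assumption, as are the function name and the dictionary layout.

```python
import random

# Ranges stated in the constraints above.
LEARN_RANGE = (0.01, 0.45)
GUESS_RANGE = (0.01, 0.30)
SLIP_RANGE = (0.01, 0.30)


def sample_skill_parameters(n_skills, rng):
    """Draw one parameter set per skill, uniformly within each range (assumed)."""
    params = []
    for _ in range(n_skills):
        params.append({
            "learn": rng.uniform(*LEARN_RANGE),
            "guess": rng.uniform(*GUESS_RANGE),
            "slip": rng.uniform(*SLIP_RANGE),
            "forget": 0.0,  # assumed: no transition from knowing to not knowing
        })
    return params
```

Because the guess and slip probabilities never exceed 0.30, every sampled skill satisfies guess < 1 - slip, so answering correctly by guessing is always less likely than answering correctly from knowledge.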

Knowledge Tracing can sometimes provide bad parameter
estimates. Beck and Chang [5] argued that when Knowl- 
edge Tracing performs badly, it is often because of incorrect 
estimation of the initial knowledge of the students (initial 
probabilities). We want to make sure that our results are 
better than Knowledge Tracing because of the strengths of 
Dynamic Cognitive Tracing, not because Knowledge Trac- 
ing got stuck in an “unlucky” local optimum. Therefore, we 
constrain all of the students to not have any initial knowl- 
edge in our experiments. 

E-M is used to learn the parameters of the models.
Knowledge Tracing and Dynamic Cognitive Tracing are initialized
with random parameters; however, the emission probabilities
(slip and guess probabilities) of Dynamic Cognitive
Tracing are initialized using a single-skill model. We
run E-M from five different random initializations.

Unless noted otherwise, each dataset is divided in three 
parts: (i) a training set with 200 students, (ii) a development 
set with 50 students, used to choose the best out of five 
random initializations of the E-M algorithm, and (iii) a test 
set with 50 students. Students do not overlap among the 
sets. 

We report the performance of our models using two met- 
rics: 

• Average Per-item Likelihood. Likelihood is a common
metric to evaluate models that find latent structure [12].
It measures how likely a model is to predict
the test set. It penalizes more heavily incorrect
predictions made with high confidence. More formally, let I be
the number of students in the test set, let \hat{p}_{i,t} be the
estimated performance of student i at time t, let p_{i,t} be
the real performance of the student and let T_i be the
number of time steps for student i. Then we compute
the per-item likelihood as:

\[
\frac{\sum_{i}^{I} \sum_{t}^{T_i} \Pr(\hat{p}_{i,t} = p_{i,t} \mid p_{i,t-1}, id_{i,t})}{\sum_{i}^{I} T_i}
\]

• Classification Accuracy. Classification accuracy measures
how often the predicted performance matches the
actual performance. Formally, let \delta(\cdot) be the indicator
function that returns 1 iff its argument is true, and 0
otherwise. We compute the accuracy as:

\[
\frac{\sum_{i}^{I} \sum_{t}^{T_i} \delta\left(\Pr(\hat{p}_{i,t} = p_{i,t} \mid p_{i,t-1}, id_{i,t}) > 0.5\right)}{\sum_{i}^{I} T_i}
\]
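
The two metrics can be sketched in Python as follows. The sketch assumes that a model exposes, for every student and time step, its predicted probability of a correct answer, and that both metrics are normalized by the total number of time steps in the test set. The function names and input layout are illustrative assumptions.

```python
def per_item_likelihood(prob_correct, actual):
    """Mean probability assigned to the observed outcome over all students and steps.

    prob_correct[i][t] : predicted P(correct) for student i at step t
    actual[i][t]       : observed outcome (1 = correct, 0 = incorrect)
    """
    total, count = 0.0, 0
    for probs, outcomes in zip(prob_correct, actual):
        for p, y in zip(probs, outcomes):
            total += p if y == 1 else 1.0 - p
            count += 1
    return total / count


def classification_accuracy(prob_correct, actual):
    """Fraction of steps where the observed outcome received probability > 0.5."""
    hits, count = 0, 0
    for probs, outcomes in zip(prob_correct, actual):
        for p, y in zip(probs, outcomes):
            p_observed = p if y == 1 else 1.0 - p
            hits += 1 if p_observed > 0.5 else 0
            count += 1
    return hits / count
```

With predictions `[[0.8, 0.4], [0.6]]` and outcomes `[[1, 0], [0]]`, the observed outcomes receive probabilities 0.8, 0.6 and 0.4, so the per-item likelihood is 0.6 and the accuracy is 2/3. The example shows why the likelihood is the stricter metric: a confident wrong prediction lowers it more than a hesitant one, while accuracy counts both as a single miss.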

In the next section, we report all of the parameter
combinations we experimented with. We did
not perform any additional tuning besides the one reported
in the next section.

4.2 Results 

We create a total of 60 random synthetic datasets using
the constraints explained in Section 4.1. All of them have
four types of items (Ids = 4). We created twenty datasets
each with 2, 3 and 4 skills (S = 2, 3, 4).

Figure 6: Average Likelihood of Dynamic Cognitive Tracing
and single-skill Knowledge Tracing in 60 different data sets

Table 1: Dynamic Cognitive Tracing’s worst performing
dataset (highlighted in Figure 6)

                        Skill 1   Skill 2
Learning probability:     .35       .30
Slip probability:         .09       .08
Guess probability:        .02       .11

In Figure 6, the horizontal axis denotes the Likelihood of 
single-skill Knowledge Tracing. The vertical axis is the Like- 
lihood of Dynamic Cognitive Tracing. The solid line divides 
the datasets in which Dynamic Cognitive Tracing performed 
better than Knowledge Tracing (upper left corner) and the 
ones in which it performed worse (lower right corner). The 
dotted lines represent the confidence interval for the mean 
of the Likelihood of Knowledge Tracing. Dynamic Cognitive
Tracing performs as well as or better than the baseline in a
total of 52 (87%) of the datasets.

Is estimating a cognitive model with Dynamic Cognitive
Tracing better than assuming a single-skill model? We
compare the mean Likelihood of Dynamic Cognitive Tracing
(x̄_DCT = 62.34, s_DCT = 5.13) with the mean Likelihood of
single-skill Knowledge Tracing (x̄_KT = 59.97, s_KT = 5.18).
The null hypothesis is that the mean Likelihood of both
models is the same (H_0: μ_DCT = μ_KT). We perform a
two-tailed t-test, pairing on the datasets (n = 60). We reject
the null hypothesis H_0 at significance level p < 0.05. We
conclude that Dynamic Cognitive Tracing outperforms
Knowledge Tracing with a single-skill assumption.
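
A paired t-test operates on the per-dataset differences between the two models. The sketch below computes the test statistic with the Python standard library only. The per-dataset Likelihoods are not listed in the text, so any input values are hypothetical, and the function name is an assumption.

```python
import math


def paired_t_statistic(a, b):
    """Return (t, degrees of freedom) for a paired t-test on samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    # Unbiased sample variance of the differences (must be non-zero).
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean / math.sqrt(var / n), n - 1
```

With 60 paired datasets the statistic has 59 degrees of freedom, and a two-tailed test at the 0.05 level rejects the null hypothesis when the absolute value of t exceeds roughly 2.0.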

In Figure 6 the arrow points to the dataset on which Dynamic
Cognitive Tracing performs worst relative to the single-skill
Knowledge Tracing baseline. The Likelihood of the true model
is 65%, of Dynamic Cognitive Tracing is 57%, and of single-skill
Knowledge Tracing is 61%. We now investigate why Knowledge
Tracing outperforms Dynamic Cognitive Tracing on this
specific dataset. Table 1 shows the parameters of the student
model. We notice that both skills’ learning and slip
probabilities are very similar. We run the E-M algorithm using
100 different random initializations for both Dynamic
Cognitive Tracing and Knowledge Tracing. We use the same
training set used for the highlighted dataset of Figure 6. To
ensure more reliable results, we use a larger test set of 200
students (instead of 50 students). Figure 7 shows the
Cumulative Distribution Function of the Likelihood over 100
random initializations. For a specific Likelihood ℓ on the
horizontal axis, the vertical axis is the percentage of
initializations whose Likelihood is less than or equal
to ℓ. Figure 7 shows that the Likelihood of the true model
is 62.6%. The best Likelihood of Dynamic Cognitive Tracing
is 61.1%, and of single-skill Knowledge Tracing is 59.7%.
Knowledge Tracing gets stuck in local optima in less than
5% of the restarts. On the other hand, for this dataset,
Dynamic Cognitive Tracing gets stuck in local optima 99%
of the time. While there is a Dynamic Cognitive Tracing
solution that outperforms Knowledge Tracing, the E-M
algorithm found it in only 4% of the initializations.

Figure 7: Cumulative Distribution Function of the
Likelihood over 100 restarts (using the dataset highlighted in
Figure 6). Horizontal axis: Likelihood ℓ; vertical axis:
Empirical CDF; series: K.T., D.C.T., True Model.

Table 2: Model Comparison Over Number of Skills

                 2 skills       3 skills       4 skills
               Acc.   Lik.    Acc.   Lik.    Acc.   Lik.
True model     .75    .64     .75    .61     .76    .62
DCT            .74    .63     .73    .62     .73    .62
KT (1 skill)   .71    .61     .69    .59     .70    .60
Majority       .63    -       .66    -       .67    -

Proceedings of the 5th International Conference on Educational Data Mining

In Table 2, we aggregate the results of Figure 6. We
report the mean performance of the parameters that generated
the 60 synthetic data sets (True model), Dynamic Cognitive
Tracing (DCT), single-skill Knowledge Tracing (KT), and
the classifier that always predicts the majority class
(Majority). We present the mean Classification Accuracy and the
mean Likelihood. Dynamic Cognitive Tracing has a similar
Likelihood and Classification Accuracy to the True Model
and dominates Knowledge Tracing.

Let’s study a sample cognitive model estimated using Dy- 
namic Cognitive Tracing. Here Q* is the True Model’s cog- 
nitive model from which the synthetic data was generated. 
An estimate Q, learned from data using our approach is: 

Figure 9: Classification accuracy using different training set
sizes


The estimated cognitive model has some uncertainty, but 
if we round Q to integer values, it matches Q* . In future 
work, we are interested in using Bayesian priors to encourage 
sparse entries in Q [13]. Bayesian estimation is not currently 
supported by the BNT toolkit in which we implemented our 
model. 

In Figure 8 we show how long it took to perform a single 
restart of Dynamic Cognitive Tracing and Knowledge Trac- 
ing. Although Dynamic Cognitive Tracing achieves better 
accuracy, its exact inference implementation does not scale 
well with the number of skills. 

We now simulate the effect of different amounts of
training data. For this, we experiment with 50, 100, 200 and
400 students. We observed that in the PSLC DataShop [17],
a repository for student data sets, it is common for smaller
datasets to have data from at least 50 students. We assess
the performance of our approach using ten synthetic training
sets with different numbers of students. For all experiments
here, we used four different types of items (Ids = 4), and
two skills (S = 2). In Figure 9, the “True model” line
represents the classification accuracy of the model using the
parameters from which the synthetic data was generated.
The Knowledge Tracing line shows the performance of this
approach, using a single skill. The results suggest that the
approaches compared can achieve good performance even on
smaller datasets.

Table 3: Model Comparison Over Number of Items

                 Ids = 4        Ids = 8        Ids = 12
               Acc.   Lik.    Acc.   Lik.    Acc.   Lik.
True model     .77    .66     .70    .60     .73    .61
DCT            .76    .65     .67    .58     .66    .57
KT (1 skill)   .73    .63     .64    .56     .66    .57
Majority       .66    -       .59    -       .58    -

Since we are actually clustering similar items into skills,
the number of different items (Ids) may have an impact on
the performance of our approach. We create ten sets with 4,
8 and 16 item types respectively (Ids = 4, 8, 16). All of them
have two skills (S = 2). In Table 3, we summarize the
Likelihood and the Classification Accuracy of different models.
The true model’s parameters achieve the highest likelihood,
followed by our approach, which dominates Knowledge
Tracing.


5. CONCLUSION 

We propose Dynamic Cognitive Tracing as a novel unified 
approach to two problems previously addressed separately in 
Intelligent Tutoring Systems: (i) Student Modeling, which 
infers students’ learning by observing student performance 
[9], and (ii) Cognitive Modeling, which factorizes problem 
solving steps into the latent set of skills required to perform 
them [7]. 

We provide empirical results using synthetic data
supporting that our unsupervised approach is better than assuming
that all items come from the same skill. Dynamic Cognitive
Tracing significantly outperforms Knowledge Tracing using
a single-skill assumption.

We used the Bayesian Networks Toolkit to quickly proto- 
type our approach. However, our prototype is limited in 
that (i) the inference algorithm used by the toolkit leads 
to complexity exponential in the number of skills, and (ii) 
the optimization algorithm gets stuck in local optima. We 
recommend implementing Dynamic Cognitive Tracing using 
approximate inference as future work. 

For simplicity, in this paper we limited our study to syn- 
thetic data of items that require a single skill. However, 
our formulation is capable of discovering items that require 
multiple skills. It is an empirical question that we leave 
for future work to understand how well Dynamic Cognitive 
Tracing performs in this context. 

We are also interested in comparing Dynamic Cognitive 
Tracing to other automatic methods that produce cognitive 
models from data, such as matrix factorization techniques 
[27]. An interesting alternative we leave unexplored is find- 
ing a cognitive model by first clustering items into skills, 
and then using Knowledge Tracing with the discovered cog- 
nitive model. However, it is not clear how to learn the skill 
clustering from data that comes at different points of time. 
For example, it is not obvious how PCA could be applied to 
temporal data. To our knowledge, we are the first to
propose a fully unsupervised method that combines student
modeling with discovering a cognitive model.